Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | في | 26 | ذلك |
2 | من | 27 | كما |
3 | على | 28 | وفي |
4 | إلى | 29 | تم |
5 | ، | 30 | كانت |
6 | أن | 31 | سنة |
7 | التي | 32 | وهو |
8 | أو | 33 | قبل |
9 | عام | 34 | خلال |
10 | عن | 35 | كل |
11 | مع | 36 | قد |
12 | كان | 37 | حتى |
13 | هذه | 38 | عدد |
14 | و | 39 | لم |
15 | هو | 40 | بعض |
16 | هذا | 41 | غير |
17 | الذي | 42 | مثل |
18 | ما | 43 | ثم |
19 | بعد | 44 | أي |
20 | بين | 45 | يمكن |
21 | حيث | 46 | له |
22 | وقد | 47 | وهي |
23 | هي | 48 | أنه |
24 | بن | 49 | أكثر |
25 | لا | 50 | وكان |
The table shows the 50 most frequent words of the corpus. At this frequency level, the list usually consists mostly of stopwords.
Language: Arabic
This list is a good candidate for a first stopword list for the language.
A small, balanced corpus is usually enough to obtain a reliable list of high-frequency words. If a small corpus is dominated by one prominent topic, however, words from that topic will show up even among the top-ranked words.
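The idea of taking the top of a frequency list as a first stopword list can be sketched in a few lines. This is only an illustration: the helper `top_words` and the toy English text are hypothetical and not part of the corpus tooling, and real corpora would need proper tokenization.

```python
from collections import Counter

def top_words(tokens, n=50):
    """Return the n most frequent tokens as (rank, word, count) tuples."""
    counts = Counter(tokens)
    # most_common() sorts by descending count
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(n), start=1)]

# Hypothetical toy corpus, tokenized naively on whitespace
text = "the cat sat on the mat and the dog sat on the rug"
stopword_candidates = top_words(text.split(), n=3)

for rank, word, count in stopword_candidates:
    print(rank, word, count)
```

With a topic-heavy corpus, content words from that topic would climb into this list, so the candidates are best reviewed by hand before being used as stopwords.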
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
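To try the query without access to the corpus database, it can be run against a small in-memory SQLite table. This is a sketch under assumptions: only the `words` columns used by the query (`w_id`, `word`) are modelled, the three sample rows are copied from the top of the table above, and the placeholder row with a low ID is invented. It also assumes, as the query suggests, that IDs above 100 are assigned in order of descending frequency.

```python
import sqlite3

# In-memory stand-in for the corpus database (assumption: minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# One invented row with w_id <= 100, which the query skips,
# plus the first three ranked words from the table above.
rows = [(1, "<placeholder>"), (101, "في"), (102, "من"), (103, "على")]
conn.executemany("INSERT INTO words (w_id, word) VALUES (?, ?)", rows)

query = """
    SELECT w_id - 100 AS rank_in_wordlist, word
    FROM words
    WHERE w_id > 100
    ORDER BY w_id
    LIMIT 50
"""
result = conn.execute(query).fetchall()
conn.close()

for rank, word in result:
    print(rank, word)
```

The offset of 100 turns the stored ID directly into the rank, so no separate frequency column is needed for this listing.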
3.4 Sample words for different frequency ranges